4 research outputs found

Efficient and secure document similarity search cloud utilizing mapreduce

Author: Alewiwi Mahmoud Khaled
Publication date: 01/01/2015

Document similarity has important real-life applications such as finding duplicate web sites and identifying plagiarism. While basic techniques such as k-similarity algorithms have long been known, the overwhelming amount of data being collected, as in big data settings, calls for novel algorithms that find highly similar documents in a reasonably short amount of time. In particular, pairwise comparison of documents sharing a common feature necessitates prohibitively high storage and computation power. The widespread availability of cloud computing gives users easy access to high storage and processing power. Furthermore, outsourcing data to the cloud guarantees reliability and availability of the data, although privacy and security concerns are not always properly addressed. This leads to the problem of protecting the privacy of sensitive data against adversaries, including the cloud operator. Generally, traditional document similarity algorithms tend to compare the query document with every document in the data set that shares a term (word) with it. In our work, we propose a new filtering technique that works on plaintext data and decreases the number of comparisons between the query set and the search set needed to find highly similar documents. The technique, referred to as the ZOLIP algorithm, is efficient and scalable, but does not provide security. We also design and implement three secure similarity search algorithms for text documents, namely Secure Sketch Search, Secure Minhash Search and Secure ZOLIP. The first algorithm utilizes locality sensitive hashing techniques and cosine similarity. The second algorithm uses the MinHash algorithm, and the last one uses the encrypted ZOLIP signature, which is the secure version of the ZOLIP algorithm. We utilize the Hadoop distributed file system and the MapReduce parallel programming model to scale our techniques to big data settings. Our experimental results on real data show that some of the proposed methods perform better than previous work in the literature in terms of the number of joins and, therefore, speed.
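The abstract names MinHash as the basis of one of the three secure schemes. As a rough illustration of the underlying plaintext technique only, the sketch below estimates Jaccard similarity between two documents from their MinHash signatures. It is not the thesis's Secure Minhash Search: there is no encryption, and the word-shingle size, the number of hash functions, and the use of salted BLAKE2b as the hash family are all assumptions made for this example.

```python
import hashlib

def shingles(text, k=2):
    """Split text into a set of k-word shingles (k is an illustrative choice)."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}

def minhash_signature(items, num_hashes=128):
    """For each of num_hashes salted hash functions, keep the minimum hash value."""
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "big")  # a different salt gives a different hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(item.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for item in items
        ))
    return signature

def estimate_jaccard(sig_a, sig_b):
    """The fraction of matching signature positions estimates Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)

def similarity(doc_a, doc_b):
    """Convenience wrapper: shingle both documents, then compare signatures."""
    return estimate_jaccard(
        minhash_signature(shingles(doc_a)),
        minhash_signature(shingles(doc_b)),
    )
```

Because signatures have a fixed length regardless of document size, comparing two signatures is cheap, which is what makes this family of techniques attractive when the number of candidate pairs is large.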

Sabanci University Research Database

A Unified Framework for Secure Search Over Encrypted Cloud Data

Author: Cengiz Orencik
Erkay Savas
Mahmoud Alewiwi
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 28/05/2017

This paper presents a unified framework that supports different types of privacy-preserving search queries over encrypted cloud data. In the framework, users can perform multi-keyword search, range search and k-nearest neighbor search operations in a privacy-preserving manner. All three types of queries are transformed into predicate-based search by leveraging bucketization, locality sensitive hashing and homomorphic encryption techniques. The proposed framework is implemented using Hadoop MapReduce, and its efficiency and accuracy are evaluated using publicly available real data sets. The implementation results show that the proposed framework can be used effectively on moderate-sized data sets and that it scales to much larger data sets, provided that the number of computers in the Hadoop cluster is increased. To the best of our knowledge, the proposed framework is the first privacy-preserving solution in which three different types of search queries are effectively applied over encrypted data.
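The abstract mentions bucketization as one way a range query becomes a predicate-based search. The following is a minimal sketch of the generic bucketization idea, not the paper's construction: values are mapped to fixed-width buckets, each bucket ID is replaced by a keyed HMAC token, and a range query becomes a set of tokens the server can match without seeing the values. The bucket width, the key, and all function names are hypothetical.

```python
import hashlib
import hmac

BUCKET_WIDTH = 10  # hypothetical bucket size; wider buckets leak less but return more false positives

def bucket_token(key, bucket_id):
    """Keyed token for a bucket ID, so the server never sees the ID itself."""
    return hmac.new(key, str(bucket_id).encode("utf-8"), hashlib.sha256).hexdigest()

def index_value(key, value):
    """Token stored on the server for a record holding this integer value."""
    return bucket_token(key, value // BUCKET_WIDTH)

def range_query_tokens(key, low, high):
    """Tokens for every bucket that overlaps the inclusive range [low, high]."""
    first, last = low // BUCKET_WIDTH, high // BUCKET_WIDTH
    return {bucket_token(key, b) for b in range(first, last + 1)}

def server_search(index, tokens):
    """Server side: return IDs of records whose stored token matches the query."""
    return [record_id for record_id, token in index.items() if token in tokens]

def client_filter(candidates, plaintext, low, high):
    """Client side: discard false positives from partially covered buckets."""
    return [r for r in candidates if low <= plaintext[r] <= high]
```

In this sketch the server returns a superset of the true answer, because a bucket may only partly overlap the queried range; the client removes the extra records after decryption (represented here by a plaintext lookup).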

Cryptology ePrint Archive

Secure sketch search for document similarity

Author: Alewiwi Mahmoud Khaled
Orencik Cengiz
Savas Erkay
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 22/08/2015

Document similarity search is an important problem with many applications, especially for outsourced data. With the spread of cloud computing, users tend to outsource their data to remote servers that are not necessarily trusted. This leads to the problem of protecting the privacy of sensitive data. We design and implement two secure similarity search schemes for textual documents utilizing locality sensitive hashing techniques for cosine similarity. While the first scheme provides very fast search times and a decent level of privacy, the second enjoys enhanced security properties, such as hiding the search and access patterns, at the cost of higher latency.
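Locality sensitive hashing for cosine similarity is commonly realized with random hyperplanes: each sketch bit records on which side of a random hyperplane the document vector falls, and the fraction of differing bits between two sketches estimates the angle between the vectors. The sketch below shows only that plaintext building block, under assumptions of this example (raw term counts as weights, 256 bits, hyperplanes derived from a seeded generator); it does not reproduce either of the paper's secure schemes.

```python
import math
import random
from collections import Counter

def term_vector(text):
    """Bag-of-words vector with raw term counts (a simplifying assumption)."""
    return Counter(text.lower().split())

def hyperplane_component(term, bit, seed=0):
    """Deterministic Gaussian coordinate of hyperplane `bit` along dimension `term`."""
    return random.Random(f"{seed}:{bit}:{term}").gauss(0.0, 1.0)

def sketch(vector, num_bits=256):
    """One bit per hyperplane: the sign of the dot product with the document vector."""
    bits = []
    for bit in range(num_bits):
        dot = sum(w * hyperplane_component(t, bit) for t, w in vector.items())
        bits.append(1 if dot >= 0 else 0)
    return bits

def estimated_cosine(sketch_a, sketch_b):
    """Hamming distance / num_bits approximates angle / pi between the vectors."""
    hamming = sum(1 for a, b in zip(sketch_a, sketch_b) if a != b)
    return math.cos(math.pi * hamming / len(sketch_a))
```

Two documents with the same term counts produce identical sketches, while documents with no terms in common differ in roughly half of their bits, giving an estimate near zero.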

Crossref

Sabanci University Research Database

Efficient top-k similarity document search utilizing distributed file systems and cosine similarity

Author: Alewiwi Mahmoud Khaled
Orencik Cengiz
Savas Erkay
Publication venue: Springer Science and Business Media LLC
Publication date: 09/11/2015

Sabanci University Research Database